TD-∆π: A Model-Free Algorithm for Efficient Exploration
نویسندگان
چکیده
We study the problem of finding efficient exploration policies for the case in which an agent is momentarily not concerned with exploiting, and instead tries to compute a policy for later use. We first formally define the Optimal Exploration Problem as one of sequential sampling and show that its solutions correspond to paths of minimum expected length in the space of policies. We derive a model-free, local linear approximation to such solutions and use it to construct efficient exploration policies. We compare our model-free approach to other exploration techniques, including one with the best known PAC bounds, and show that ours is both based on a well-defined optimization problem and empirically efficient.
منابع مشابه
Augmented Downhill Simplex a Modified Heuristic Optimization Method
Augmented Downhill Simplex Method (ADSM) is introduced here, that is a heuristic combination of Downhill Simplex Method (DSM) with Random Search algorithm. In fact, DSM is an interpretable nonlinear local optimization method. However, it is a local exploitation algorithm; so, it can be trapped in a local minimum. In contrast, random search is a global exploration, but less efficient. Here, rand...
متن کاملAn algorithm for the anchor points of the PPS of the CCR model
Anchor DMUs are a new class in the general classification of Decision Making Units (DMUs) in Data Envelopment Analysis (DEA). An anchor DMU in DEA is an extreme-efficient DMU that defines the transition from the efficient frontier to the free-disposability part of the boundary of the Production Possibility Set (PPS). In this paper, the anchor points of the PPS of the CCR model are investigated....
متن کاملTrue Online Emphatic TD($\lambda$): Quick Reference and Implementation Guide
This document is a guide to the implementation of true online emphatic TD(λ), a model-free temporal-difference algorithm for learning to make long-term predictions which combines the emphasis idea (Sutton, Mahmood & White 2015) and the true-online idea (van Seijen & Sutton 2014). The setting used here includes linear function approximation, the possibility of off-policy training, and all the ge...
متن کاملAn improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
متن کاملCalculation of One-dimensional Forward Modelling of Helicopter-borne Electromagnetic Data and a Sensitivity Matrix Using Fast Hankel Transforms
The helicopter-borne electromagnetic (HEM) frequency-domain exploration method is an airborne electromagnetic (AEM) technique that is widely used for vast and rough areas for resistivity imaging. The vast amount of digitized data flowing from the HEM method requires an efficient and accurate inversion algorithm. Generally, the inverse modelling of HEM data in the first step requires a precise a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012